An Ultra-fast Approach to Align Longer Short Reads onto Human Genome

Authors

  • Arup Ghosh
  • Gi-Nam Wang
  • Satchidananda Dehuri
Abstract

With the advent of second-generation sequencing (SGS) technologies, deoxyribonucleic acid (DNA) sequencing machines have begun to produce reads, termed “longer short reads”, that are much longer than the so-called “short reads” of the previous generation. Unfortunately, most existing read aligners do not scale well to these longer short reads, and many are limited to previous-generation short reads. In this paper, we propose a new approach to this essential read alignment problem for current-generation longer short reads. Our ultra-fast approach uses a hash-based indexing and searching scheme to find exact matches for second-generation longer short reads within a reference genome. The experimental study shows that the proposed approach can accurately match millions of reads against the human genome within a few seconds, and that it is an order of magnitude faster than Burrows-Wheeler Transform (BWT) based methods such as Bowtie and Burrows-Wheeler Aligner (BWA) across a wide range of read lengths.
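
The abstract does not spell out the indexing scheme, so the following is only a minimal sketch of a generic hash-based exact matcher, not the authors' method. It assumes a simple seed-and-verify design: every k-mer of the reference is hashed to its start positions, and a read is looked up by its first k bases and then verified in full. The names `build_index`, `find_exact`, and the seed length `k` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of its start positions.

    Assumption: a plain Python dict stands in for the hash table; a real
    aligner would use a far more compact encoding for a human-sized genome.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_exact(read: str, reference: str, index: dict, k: int) -> list:
    """Return all reference positions where the whole read matches exactly."""
    if len(read) < k:
        return []  # read is shorter than the seed, so it cannot be looked up
    hits = []
    # Look up the first k bases of the read, then verify the full read
    # at each candidate position.
    for pos in index.get(read[:k], []):
        if reference.startswith(read, pos):
            hits.append(pos)
    return hits


# Toy example (hypothetical data, not taken from the paper).
reference = "ACGTACGTTAGC"
k = 4
index = build_index(reference, k)
print(find_exact("ACGTT", reference, index, k))  # seed "ACGT" occurs at 0 and 4; only 4 verifies
```

In this sketch the index is built once and reused for every read, so the per-read cost is one hash lookup plus a verification at each candidate position. This is the usual motivation for hash-based schemes when only exact matches are required.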

Similar resources

SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads

As the cost efficiency of the next generation DNA sequencing technology keeps improving, there is an ever-increasing demand for high-throughput software to align the enormous number of short reads (patterns) with reference genomes (such as the human genome). In the past few years, a number of very fast alignment software (e.g., Maq, SOAP2, ZOOM, Bowtie, BWA) have been developed; most of them ar...

SOAP3: ultra-fast GPU-based parallel alignment tool for short reads

SOAP3 is the first short read alignment tool that leverages the multi-processors in a graphic processing unit (GPU) to achieve a drastic improvement in speed. We adapted the compressed full-text index (BWT) used by SOAP2 in view of the advantages and disadvantages of GPU. When tested with millions of Illumina Hiseq 2000 length-100 bp reads, SOAP3 takes < 30 s to align a million read pairs onto ...

Fast and accurate short read alignment with Burrows–Wheeler transform

MOTIVATION The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for sin...

SRmapper: a fast and sensitive genome-hashing alignment tool

UNLABELLED Modern sequencing instruments have the capability to produce millions of short reads every day. The large number of reads produced in conjunction with variations between reads and reference genomic sequences caused both by legitimate differences, such as single-nucleotide polymorphisms and insertions/deletions (indels), and by sequencer errors make alignment a difficult and computati...

Faster and More Accurate Sequence Alignment with SNAP

We present the Scalable Nucleotide Alignment Program (SNAP), a new short and long read aligner that is both more accurate (i.e., aligns more reads with fewer errors) and 10–100× faster than state-of-the-art tools such as BWA. Unlike recent aligners based on the Burrows-Wheeler transform, SNAP uses a simple hash index of short seed sequences from the genome, similar to BLAST’s. However, SNAP gre...

Journal:
  • Informatica (Slovenia)

Volume: 37, Issue: not listed

Pages: not listed

Publication date: 2013